Creating a Lexicon of Bavarian Dialect by Means of Facebook Language Data and Crowdsourcing

نویسندگان

  • Manuel Burghardt
  • Daniel Granvogl
  • Christian Wolff
چکیده

Data acquisition in dialectology is typically a tedious task, as dialect samples of spoken language have to be collected via questionnaires or interviews. In this article, we suggest to use the “web as a corpus” approach for dialectology. We present a case study that demonstrates how authentic language data for the Bavarian dialect (ISO 639-3:bar) can be collected automatically from the social network Facebook. We also show that Facebook can be used effectively as a crowdsourcing platform, where users are willing to translate dialect words collaboratively in order to create a common lexicon of their Bavarian dialect. Key insights from the case study are summarized as “lessons learned”, together with suggestions for future enhancements of the lexicon creation approach.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Perform Three Data Mining Tasks with Crowdsourcing Process

For data mining studies, because of the complexity of doing feature selection process in tasks by hand, we need to send some of labeling to the workers with crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...

متن کامل

A Supervised Method for Constructing Sentiment Lexicon in Persian Language

Due to the increasing growth of digital content on the internet and social media, sentiment analysis problem is one of the emerging fields. This problem deals with information extraction and knowledge discovery from textual data using natural language processing has attracted the attention of many researchers. Construction of sentiment lexicon as a valuable language resource is a one of the imp...

متن کامل

Effective Factors on Naming Practices in Iran: Sociopolitics or Dialect?

Naming as an inseparable sign of a country’s language has attracted the attention of many linguists to formulate and test hypotheses regarding the culture and language of the people of a certain area. Iran appears like a proper destination for conducting a research focusing on naming based on several factors such as geography or chronology. The present article aims to take a specific look at th...

متن کامل

Design and Implementation of a Software System for Detecting Orthographical or Morphological Errors in Persian Words

This paper presents a new method for analyzing words in the Persian language context to find orthographical and structural errors regardless of the meaning. This technique tokenizes each word in a statement then tries to detect the kind of word, and analyses its correctness in terms of orthography and morphology by means of a lexicon. It should be noted that some words in the Persian language h...

متن کامل

A Study of Inflectional Categories of Noun in Sistani Dialect

The present article aims to provide a synchronic study of the inflectional or morpho-syntactic categories of noun in Sistani dialect. These categories comprise person, number, gender or noun class, definiteness, case, and possession. Linguistic data was collected via recording free speech, and interviewing with 30 (15 females, 15 males) illiterate Sistani language consultants of age 40–102 year...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016